perm filename ACM.1[TLK,DBL] blob
sn#199368 filedate 1976-01-30 generic text, type T, neo UTF8
INTRODUCTION (1 min)
I'm going to talk today about the process of discovery and invention
in empirical science. My thesis, which I'll describe in a few
minutes, is a first step toward automating theory formation in an
elementary subset of mathematics.
The first thing I want to stress is the difference between solving a
given problem and thinking it up in the first place. Contrast
playing Monopoly with inventing new games of the same quality. Or
contrast solving the Missionaries and Cannibals problem with the
ill-defined task of formulating it. If I give you the definition of
prime numbers, no doubt you can find some for me, but how could you
have been motivated to propose that definition yourself?
EXPLAINING THE DISCOVERY OF PRIMES (1 min)
This leads us to wonder how we could explain a discovery, like prime
numbers. Why did anybody decide they were worth investigating, worth
giving a name to? Here's one plausible scenario for their discovery:
A researcher has just discovered the concept of "divisors-of" a
number. He knows that very often it is worthwhile to examine extreme
cases, so he poses the question "Which numbers are minimally
divisible, have as few divisors as possible?" He formulates the set
of numbers with precisely one divisor; this is just the singleton set
{1}. It's too small and already well understood. Well, what about
numbers with precisely 2 divisors? That turns out to be the prime
numbers. Soon, he discovers unique factorization or some other
important relationship involving these numbers, and decides they're
interesting and worth naming.
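The "examine extreme cases" scenario above can be sketched in a few lines of code. This is a minimal illustrative sketch, not anything from the actual system: it defines divisors-of, then asks which numbers are minimally divisible.

```python
def divisors_of(n):
    """All positive divisors of n -- the concept the researcher starts from."""
    return [d for d in range(1, n + 1) if n % d == 0]

# The extreme-case heuristic: which numbers have as few divisors as possible?
with_one = [n for n in range(1, 50) if len(divisors_of(n)) == 1]
with_two = [n for n in range(1, 50) if len(divisors_of(n)) == 2]

print(with_one)  # [1] -- the singleton set, already well understood
print(with_two)  # [2, 3, 5, 7, 11, ...] -- exactly the primes
```

Nothing in the code mentions primality; the primes fall out of asking an extreme-case question about divisors-of.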
PROBLEM-REDUCTION (2 min)
What have we really done here? We reduced the problem of explaining
the discovery of prime numbers, to the simpler task of explaining the
discovery of "divisors-of". The reduction was accomplished by citing
a heuristic, which said to investigate which items give extreme
results under some function. This is a pretty general heuristic.
We could continue this process, reducing this task to the discovery
of multiplication, by citing a heuristic which said to always
investigate the inverse of interesting relations. This process can be
continued, until we've reduced our original task to the discovery of
concepts we consider to be primitive.
These reductions build up a chain of concepts, stretching from the
given discovery backwards, all the way to known concepts. Each link
in the chain is justified by some more or less general heuristic.
The chain is a plausible scenario for discovering this concept. When
we hear of a new discovery, we mentally try to create this chain
ourselves; if we succeed easily, then we might adopt the attitude
that the discovery wasn't so hard, that we could have done it
ourselves.
INVERTING THIS: TREE-GROWING (3.5 min) numerical model
Of course, the process of invention proceeds in the forward direction,
from known concepts toward new ones, not backward from the discovery.
If our tree-searching model is at all accurate, the
researcher finds himself with a large base of existing knowledge, and
several heuristic operators which suggest things for him to
investigate. Let's do a trivial algorithmic analysis of this
situation. Say there are h of these operators, and k concepts, then
the researcher might be staring at something like hk possible avenues
for research. Notice how vicious this combinatorial explosion is.
There are (h↑2)k concepts to consider after a two-step search. Since
investigating any one of these concepts might take a considerable
period of time, this process has a low success rate.
With this same simple model, let's look again at the analysis of a
given discovery. Well, there are only h things that might have led
to this, and h that led to that, so we are comparing h↑2 with (h↑2)k.
Typically, an expert will know thousands of concepts (k=2500) and
tens of kinds of operators to suggest new things to look at (h=20).
So discovering some concept is like searching a space of size
(20↑2)x2500=1,000,000. But analyzing that discovery only requires a
search of 20↑2 = 400 things.
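The arithmetic of the model, using the talk's own figures (h=20 operators, k=2500 concepts), works out as follows; the variable names are just the talk's symbols:

```python
h, k = 20, 2500   # heuristic operators, known concepts

one_step = h * k      # avenues for research after one step:  h*k
two_step = h**2 * k   # avenues after a two-step search:      (h^2)*k
analysis = h**2       # chains to check when rationalizing a KNOWN discovery

print(one_step)   # 50000
print(two_step)   # 1000000
print(analysis)   # 400
```

The 1,000,000-to-400 ratio is the gap between producing a two-step discovery and explaining one after the fact.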
So producing a discovery is quite a bit harder than rationalizing how
you might have done it, because in the analysis process you at least
know one end-point in this 2-step chain. This explains why
discoveries sometimes seem so obvious after the fact, even to their
inventor, who may have labored years on the problem.
If we introduce the use of heuristic strategies in our model, these
numbers shrink dramatically. The total number of potential s-step
discoveries is (h↑s)k. Strategies have two effects on this formula.
If a strategy suggests which operations will be the most fruitful
to apply, from any given node, that effectively reduces the size of
h. If a strategy tells you to focus in on specific nodes which seem
promising (and to ignore all the rest), that effectively reduces k.
A good researcher must know -- and use -- heuristic strategies
which produce both these effects.
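The effect of the two kinds of strategies on the (h↑s)k formula can be made concrete. The reduction factors below (operators cut to a quarter, nodes to a tenth) are purely illustrative assumptions, not figures from the talk:

```python
def space(h, k, s):
    """Potential s-step discoveries under the talk's model: (h^s) * k."""
    return h**s * k

h, k, s = 20, 2500, 2

blind  = space(h, k, s)              # no strategies: 400 * 2500
pruned = space(h // 4, k // 10, s)   # strategies trim both h and k

print(blind)    # 1000000
print(pruned)   # 6250
```

Because h is raised to the power s, even a modest cut in h compounds at every level of the search, which is why strategies of the first kind matter so much.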
MY THESIS: AM (1 min)
I'm saying that theory formation can be considered a heuristic
search: That is, a continual expansion of a tree of concepts and
relationships, guided by some judgmental criteria, some heuristic
rules of thumb.
If this is true, one should be able to write a computer program which
proposes new concepts in some scientific domain. After picking the
field in which the system will work, there are 3 things you'd have to
do:
(1) specify the initial core of knowledge that it will start with,
(2) specify the set of operators which indicate all the legal tasks
one might do to any given concept to expand that knowledge, and
(3) collect the heuristics and strategies that experts in that
field use when trying to form theories.
This is what I've tried to do for some elementary branches of
mathematics. Since few of you have any professional interest in set
theory or number theory, I'll limit myself to a few general remarks.
SLOSH TIME: 30 seconds **************************************
ACTUAL IMPLEMENTATION: repr and control (2 min)
First I selected a hundred very fundamental concepts to let the
system begin with. There are a couple dozen different facets to each
concept, but I supplied only the definitions. All the system can do
to a concept is pick a blank facet and try to fill it in. So the
total number of possible activities that the system faces at any
moment is about hk, where there are k concepts and each one has about
h facets. The legal operators can be seen to correspond to the set
of facets.
I used introspection to construct a list of several hundred
heuristics. The final task was to decide on suitable representations
and algorithms.
Each concept has a property list indicating its known facets.
<slide: exs, defns, domain/range, generalizations,...>
We can treat the relevant heuristics as just another facet of each
concept. Each heuristic
or strategy will be written as a simple production rule. When the
system is working on some particular activity, all the relevant
heuristics will be thrown together into one production system and
run.
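The representation just described can be sketched as a property list of facets, with the heuristics stored as one more facet. This is a hypothetical sketch in modern notation; AM itself was written in INTERLISP, and these names and structures are illustrative, not the program's actual ones:

```python
# A concept is a property list of facets; heuristics are just another facet,
# each a (condition, action) production rule.  Illustrative only.

FACETS = ["examples", "definitions", "domain/range", "generalizations",
          "specializations", "heuristics"]

divisors_of = {
    "name": "divisors-of",
    "definitions": ["d divides n iff n mod d = 0"],
    "domain/range": [("number", "set-of-numbers")],
    "examples": {6: [1, 2, 3, 6]},
    "heuristics": [
        # production rule: (condition on the concept, suggested activity)
        (lambda c: bool(c.get("examples")),
         "look at items giving extreme values under this function"),
    ],
}

# The only legal activity: pick a blank facet of some concept and fill it in.
def blank_facets(concept):
    return [f for f in FACETS if f not in concept]

print(blank_facets(divisors_of))  # ['generalizations', 'specializations']
```

With k concepts and about h facets each, enumerating every (concept, blank facet) pair gives exactly the hk possible activities mentioned above.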
What does the system actually do, now? It expands its knowledge, both
by finding out more about the already-known concepts, and
occasionally by defining new ones. That means it fills out the
properties of existing data structures, and sometimes creates new
ones.
The flow of control is governed by a job-list, a sequence of
suggested activities to do, an ordered list of candidates for the
system's attention. Each candidate job is proposed by a heuristic.
It specifies that a plausible thing to do is to fill out a certain
facet of a certain concept. The job is also tagged with a list of
definite reasons why it might be worth doing. Based on those
reasons, the job is assigned a priority rating from 0 to 1000.
A very crucial mechanism comes into play if the newly-proposed job
was already in the job-list. In such cases, the list of reasons is
examined carefully. If the job is being proposed for pretty much the
same reasons as before, then its priority rating is incremented
slightly. But if it is being proposed for a new reason, then its
priority is increased tremendously.
The basic flow of control is quite simple. The system looks over the
job-list, and tries to execute the job with the highest priority.
Its priority rating indicates the number of milliseconds to spend
before giving up and trying the next one. In the process of
performing the job, new jobs may be suggested, new concepts created,
blank facets filled in, etc.
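The job-list mechanism just described can be sketched as follows. This is an illustrative reconstruction under assumed numbers (a base rating of 100, a small bump for a repeated reason, a large boost for a new one), not AM's actual code or rating formula:

```python
# Agenda ("job-list") sketch: each job names a facet of a concept to fill in,
# carries a set of reasons, and gets a priority rating from 0 to 1000.
# Re-proposing a job for a NEW reason boosts it far more than old reasons do.

class JobList:
    def __init__(self):
        self.jobs = {}   # (concept, facet) -> {"reasons": set, "priority": int}

    def propose(self, concept, facet, reason, base=100):
        key = (concept, facet)
        job = self.jobs.get(key)
        if job is None:
            self.jobs[key] = {"reasons": {reason}, "priority": base}
        elif reason in job["reasons"]:
            job["priority"] = min(1000, job["priority"] + 10)   # same reason: slight bump
        else:
            job["reasons"].add(reason)
            job["priority"] = min(1000, job["priority"] + 300)  # new reason: big boost

    def best(self):
        # The system executes the highest-priority job; its rating also
        # bounds the milliseconds to spend before giving up on it.
        return max(self.jobs, key=lambda k: self.jobs[k]["priority"])

agenda = JobList()
agenda.propose("primes", "examples", "just defined")
agenda.propose("divisors-of", "generalizations", "worth generalizing")
agenda.propose("primes", "examples", "appears in unique factorization")  # new reason
print(agenda.best())  # ('primes', 'examples')
```

The key design point is the asymmetry: independent reasons converging on the same job are strong evidence it is worth doing, while repetition of one reason is only weak evidence.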
Well, in any event, that was how AM was designed and created. You now
know all the essentials of its representations and algorithms.
PERFORMANCE and problems (1.5 min)
The system started with 100 prenumerical concepts (k=100), and about
half of the 100 new concepts it proposed were interesting. Many of
them were several levels deep (s=7), for example the notions of
numbers, multiplication, divisors, prime numbers, unique
factorization. The heuristics guided the system to look at only 200
nodes in a space of size (h↑7)k [write: (25↑7) x 100]; 25↑7 alone is
already about 6 billion. This whole development used only an hour or so of cpu
time. The system is written in INTERLISP.
The result I'm proudest of is the discovery of an entirely new
theory: a new way to characterize numbers which have an abnormally
large number of divisors.
The major problem is providing heuristics which are powerful enough
to recognize quickly whether each new concept is going to be
interesting or a loser.
Of course, the standard problem in our business is the tremendous
gap between plausible English prose that will convince a human, and
debugged code that will convince a computer. When talking to you,
it's fine for me to say that a function is interesting if its domain
and range sets coincide; but for my system, I have to specify exactly
how to calculate that interest value. The precision I need to
instruct the machine still catches me unaware sometimes, and I find
myself underestimating the time I'll need to program some idea I was
sure would be a cinch.
This problem can be summed up as "Just because you can talk about
something doesn't mean you understand it". There is no solution to
this. It is one reason that we actually write these AI programs. They
force us to make our ideas precise. This implies that we can talk
about more than we really understand. Which suggests to me that it's
about time for me to stop talking.
QUESTIONS (2 min)